5 research outputs found

    Time-frequency features for estimating the position of the articulatory organs in plosive consonants

    Acoustic-to-articulatory inversion offers new perspectives and interesting applications in the speech processing field; however, it remains an open issue. This paper presents a method to estimate the distribution of the articulatory information contained in the acoustics of stop consonants, which are parametrized using the wavelet packet transform. The main focus is on measuring the relevant acoustic information, in terms of statistical association, for inferring the position of the critical articulators involved in stop consonant production. Kendall's rank correlation coefficient is used as the relevance measure. Maps of relevant time-frequency features are calculated for the MOCHA-TIMIT database, from which stop consonants are extracted and analysed. The proposed method obtains a set of time-frequency components closely related to articulatory phenomena, which offers a deeper understanding of the relationship between articulatory and acoustic phenomena. The relevance maps are tested in an acoustic-to-articulatory mapping system based on Gaussian mixture models, where they are shown to improve the performance of such systems on stop consonants. The method could be extended to other manner-of-articulation categories, e.g. fricatives, in order to adapt it to acoustic-to-articulatory mapping systems over the whole speech signal.
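The relevance measure described above (Kendall's rank correlation between each time-frequency feature and an articulator position) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the wavelet packet features have already been computed and arranged as one list of values per frame, uses the simple tau-a variant (tied pairs count as neither concordant nor discordant), and the names `kendall_tau` and `relevance_map` are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a rank correlation between two equal-length sequences.

    Tied pairs contribute to neither the concordant nor the discordant count.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def relevance_map(features, articulator):
    """Absolute Kendall tau between every feature and one articulator coordinate.

    features: one list of (hypothetical) time-frequency feature values per frame.
    articulator: the articulator position measured at each frame.
    """
    n_features = len(features[0])
    return [abs(kendall_tau([frame[k] for frame in features], articulator))
            for k in range(n_features)]
```

Features can then be ranked with `sorted(range(len(rel)), key=lambda k: rel[k], reverse=True)`. The absolute value is taken because a strongly negative association is as informative for inference as a strongly positive one.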

    Multi-atlas label fusion by using supervised local weighting for brain image segmentation

    The automatic segmentation of structures of interest supports the morphological analysis of brain magnetic resonance imaging volumes. It demands significant effort because of the structures' complicated shapes, the low contrast between tissues, and intersubject anatomical variability. One aspect that reduces the accuracy of multi-atlas-based segmentation is the label fusion assumption of one-to-one correspondences between target and atlas voxels. To improve the performance of brain image segmentation, label fusion approaches include spatial and intensity information by using voxel-wise weighted voting strategies. Although the weights are assessed for a predefined atlas set, they are not very efficient for labeling intricate structures, since most tissue shapes are not uniformly distributed in the images. This paper proposes a methodology of voxel-wise feature extraction based on the linear combination of patch intensities. To the best of our knowledge, this is the first attempt to learn local features by maximizing the centered kernel alignment function. Our methodology aims to build discriminative representations, deal with complex structures, and reduce the influence of image artifacts. The result is an enhanced patch-based segmentation of brain images. For validation, the proposed brain image segmentation approach is compared against Bayesian-based and patch-wise label fusion on three different brain image datasets. In terms of the Dice similarity index, our proposal achieves the highest segmentation accuracy (90.3% on average), with sufficient robustness to artifacts and suitable repeatability of the segmentation results.
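The centered kernel alignment (CKA) objective mentioned above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code. It assumes a single hypothetical projection vector `w` applied to flattened patch intensities, a linear kernel on the projected values, and a label kernel equal to 1 for voxels that share a label. It only evaluates the objective for a given `w`; the optimization that learns the projection is not shown.

```python
import math


def center(K):
    """Double-centre a square kernel matrix: returns H K H with H = I - (1/n) 1 1^T."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [[K[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
            for i in range(n)]


def frobenius(A, B):
    """Frobenius inner product <A, B>_F = sum_ij A_ij * B_ij."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


def cka(K, L):
    """Centered kernel alignment between kernel matrices K and L (in [0, 1] for PSD kernels)."""
    Kc, Lc = center(K), center(L)
    return frobenius(Kc, Lc) / math.sqrt(frobenius(Kc, Kc) * frobenius(Lc, Lc))


def linear_kernel(patches, w):
    """Linear kernel on projected features z_i = w . x_i (w is a hypothetical projection)."""
    z = [sum(wk * xk for wk, xk in zip(w, x)) for x in patches]
    return [[zi * zj for zj in z] for zi in z]


def label_kernel(labels):
    """Ideal label kernel: 1 if two voxels share a label, else 0."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]
```

A projection whose output separates the labels yields a CKA close to 1, while one that mixes them yields a value near 0. Maximizing this quantity over `w` is what makes the learned features discriminative.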

    Relevant kinematic feature selection to support human action recognition in MoCap data

    This paper presents a comparison of feature selection methods for human action recognition that use only the kinematic features of a skeleton representation. For this purpose, three relevance methods are employed to rank the contribution of each kinematic feature to classifying an action. In particular, the best-performing method incorporates supervised information about the action to find a relevant set of features that encodes the most discriminative information. Experimental results are obtained on a well-known public dataset (MSR Action3D). The results encourage the use of kernel-based methods to obtain better kinematic feature selection for each action, with good generalization regardless of the subject.

    GMM background modeling using divergence-based weight updating

    Background modeling is a core task of video-based surveillance systems, used to facilitate the online analysis of real-world scenes. Nowadays, GMM-based background modeling approaches are widely used, and several versions have been proposed to improve the original one by Stauffer and Grimson. Nonetheless, the cost function employed to update the GMM weight parameters has not changed substantially and is still set by means of a single binary reference, which mostly leads to noisy foreground masks when the membership of a pixel in the background model is uncertain. To cope with this issue, we propose a cost function based on the Euclidean divergence, which provides nonlinear smoothness to the background modeling process. Results on well-known datasets show that the proposed cost function supports foreground/background discrimination, reducing the number of false positives, especially in highly dynamic scenarios.
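To make the "single binary reference" concrete, the classical Stauffer-Grimson weight update can be sketched as below: each component weight moves toward 1 if that component matched the pixel and toward 0 otherwise. This sketch shows only that baseline update. It does not implement the divergence-based cost function proposed in the paper, whose exact form is not given here. It also assumes a matching component exists; the step that replaces the least probable component when nothing matches is omitted. The function name `update_weights` is hypothetical.

```python
def update_weights(weights, matched, alpha):
    """Classical binary-ownership GMM weight update for one pixel.

    w_k <- (1 - alpha) * w_k + alpha * M_k, where M_k = 1 for the matched
    component and 0 for all others (the "single binary reference").
    weights: current mixture weights (summing to 1).
    matched: index of the component that matched the pixel value.
    alpha: learning rate in (0, 1).
    """
    new = [(1 - alpha) * w + alpha * (1.0 if k == matched else 0.0)
           for k, w in enumerate(weights)]
    # The update preserves the sum in exact arithmetic; renormalising
    # only guards against floating-point drift.
    total = sum(new)
    return [w / total for w in new]
```

Because `M_k` is hard 0/1, a pixel whose membership is ambiguous still pushes one component's weight up and all others down at full rate. This is the source of the noisy foreground masks described above, and it is what a smoother cost function aims to avoid.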

    A similarity indicator for differentiating kinematic performance between qualified tennis players

    This paper presents a data-driven approach to estimating the kinematic performance of tennis players, using kernels to extract a dynamic model of each player from motion capture (MoCap) data. A metric is introduced in the reproducing kernel Hilbert space (RKHS) to compare the similarity between models, so that the built kernel enhances the separability of two groups: the baseline reference group and the group of players still developing their skills. Validation is carried out on a specially constructed database containing two main test actions: serve and forehand strokes (performed on a tennis court). In addition, the classical kinematic analysis is used for comparison with our kernel-based approach. Results show that our approach better represents the performance of each player relative to the ideal group.
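As an illustration of what a distance computed in an RKHS can look like, the sketch below uses the maximum mean discrepancy (MMD) with a Gaussian kernel to compare two sets of per-frame kinematic feature vectors. MMD is a swapped-in, standard RKHS distance chosen for illustration. It is not the metric defined in the paper. The inputs are assumed to be hypothetical per-frame joint-angle vectors for a reference player and a player under evaluation.

```python
import math


def rbf(x, y, sigma):
    """Gaussian (RBF) kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2 * sigma ** 2))


def mmd2(X, Y, sigma=1.0):
    """Biased estimate of the squared MMD between samples X and Y.

    Equals ||mean embedding of X - mean embedding of Y||^2 in the RKHS
    induced by the Gaussian kernel, so it is never negative.
    """
    kxx = sum(rbf(a, b, sigma) for a in X for b in X) / len(X) ** 2
    kyy = sum(rbf(a, b, sigma) for a in Y for b in Y) / len(Y) ** 2
    kxy = sum(rbf(a, b, sigma) for a in X for b in Y) / (len(X) * len(Y))
    return kxx + kyy - 2 * kxy
```

A player whose movement samples lie close to the reference group's samples gets a small value. The kernel width `sigma` controls which scale of difference the comparison is sensitive to.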